Beyond Tripeptides Two-Step Active Machine Learning for Very Large Data sets
نویسندگان
چکیده
Self-assembling peptide nanostructures have been shown to be of great importance in nature and presented many promising applications, for example, medicine as drug-delivery vehicles, biosensors, antivirals. Being very candidates the growing field bottom-up manufacture functional nanomaterials, previous work (Frederix, et al. 2011 2015) has screened all possible amino acid combinations di- tripeptides search such materials. However, enormous complexity variety linear 20 acids make exhaustive simulation tetrapeptides above infeasible. Therefore, we developed an active machine-learning method (also known “iterative learning” “evolutionary method”) which leverages a lower-resolution data set encompassing whole space just-in-time high-resolution further analyzes those target peptides selected by model. This model uses newly generated upon each iteration improve both lower- higher-resolution models ideal candidates. Curation is explored control candidates, based on criteria log P. A major aim this produce best results least computationally demanding way. broadly applicable other spaces with minor changes algorithm, allowing its use areas research.
منابع مشابه
Distributed Learning on Very Large Data Sets
One approach to learning from intractably large data sets is to utilize all the training data by learning models on tractably sized subsets of the data. The subsets of data may be disjoint or partially overlapping. The individual learned models may be combined into a single model or a voting approach may be used to combine the classi cations of a set of models. An approach to learning models in...
متن کاملSoft Clustering for Very Large Data Sets
Clustering is regarded as one of the significant task in data mining and has been widely used in very large data sets. Soft clustering is unlike the traditional hard clustering which allows one data belong to two or more clusters. Soft clustering such as fuzzy c-means and rough k-means have been proposed and successfully applied to deal with uncertainty and vagueness. However, the influx of ver...
متن کاملDecision Tree Learning on Very Large Data Sets
Consider a labeled data set of 1 terabyte in size. A salient subset might depend upon the users interests. Clearly, browsing such a large data set to find interesting areas would be very time consuming. An intelligent agent which, for a given class of user, could provide hints on areas of the data that might interest the user would be very useful. Given large data sets having categories of sali...
متن کاملEnsembles of data-reduction-based classifiers for distributed learning from very large data sets
An ensemble of classifiers is a set of classification models and a method to combine their predictions into a joint decision. They were primarily devised to improve classification accuracies over individual classifiers. However, the growing need for learning from very large data sets has opened new application areas for this strategy. According to this approach, new ensembles of classifiers hav...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Journal of Chemical Theory and Computation
سال: 2021
ISSN: ['1549-9618', '1549-9626']
DOI: https://doi.org/10.1021/acs.jctc.1c00159